Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization
Authors
Abstract
Can one parallelize complex exploration–exploitation tradeoffs? As an example, consider the problem of optimal high-throughput experimental design, where we wish to sequentially design batches of experiments in order to simultaneously learn a surrogate function mapping stimulus to response and identify the maximum of the function. We formalize the task as a multi-armed bandit problem, where the unknown payoff function is sampled from a Gaussian process (GP), and instead of a single arm, in each round we pull a batch of several arms in parallel. We develop GP-BUCB, a principled algorithm for choosing batches, based on the GP-UCB algorithm for sequential GP optimization. We prove a surprising result: compared to the sequential approach, the cumulative regret of the parallel algorithm increases only by a constant factor independent of the batch size B. Our results provide rigorous theoretical support for exploiting parallelism in Bayesian global optimization. We demonstrate the effectiveness of our approach on two real-world applications.
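To make the batch-selection idea concrete, below is a minimal Python sketch of a GP-BUCB-style rule: each arm in the batch maximizes an upper confidence bound whose variance is conditioned on the arms already chosen for the batch, with their not-yet-available feedback "hallucinated" as the current posterior mean, which leaves the mean unchanged while shrinking the uncertainty. The RBF kernel, the fixed confidence parameter beta, the noise level, and the finite candidate set are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)


def gp_posterior(X_obs, y_obs, X_query, noise=1e-3):
    """Posterior mean and variance of a zero-mean GP at the query points."""
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_query)          # shape (n_obs, n_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = rbf_kernel(X_query, X_query).diagonal() - np.sum(v**2, axis=0)
    return mean, var


def select_batch(X_obs, y_obs, X_cand, batch_size, beta=4.0, noise=1e-3):
    """Greedy batch selection: maximize a UCB, then hallucinate the feedback."""
    X_h, y_h, batch = X_obs.copy(), y_obs.copy(), []
    for _ in range(batch_size):
        mu, var = gp_posterior(X_h, y_h, X_cand, noise)
        ucb = mu + np.sqrt(beta * np.maximum(var, 0.0))
        idx = int(np.argmax(ucb))
        batch.append(idx)
        # Hallucinate the outcome as the current posterior mean: the mean is
        # unchanged by this update, but the variance shrinks near X_cand[idx],
        # so the next pick in the batch is steered away from redundant arms.
        X_h = np.vstack([X_h, X_cand[idx:idx + 1]])
        y_h = np.append(y_h, mu[idx])
    return batch


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    payoff = lambda X: np.sin(6.0 * X[:, 0])       # unknown payoff (illustrative)
    X_cand = rng.uniform(0.0, 1.0, size=(200, 1))  # finite set of candidate "arms"
    X_obs = rng.uniform(0.0, 1.0, size=(3, 1))     # a few noisy initial observations
    y_obs = payoff(X_obs) + 0.01 * rng.standard_normal(3)
    print("next batch of arm indices:", select_batch(X_obs, y_obs, X_cand, batch_size=5))
```

In this sketch, only the variance is updated inside a batch; the real observations for the whole batch arrive together at the end of the round, at which point the GP is conditioned on them and the next batch is chosen.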
Similar resources
Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization
How can we take advantage of opportunities for experimental parallelization in exploration–exploitation tradeoffs? In many experimental scenarios, it is desirable to execute experiments simultaneously or in batches, rather than performing only one at a time. Additionally, observations may be both noisy and expensive. We introduce Gaussian Process Batch Upper Confidence Bound (GP-BUCB), an ...
Batched Gaussian Process Bandit Optimization via Determinantal Point Processes
Gaussian Process bandit optimization has emerged as a powerful tool for optimizing noisy black box functions. One example in machine learning is hyper-parameter optimization where each evaluation of the target function may require training a model which may involve days or even weeks of computation. Most methods for this so-called “Bayesian optimization” only allow sequential exploration of the...
Generalizing Policy Advice with Gaussian Process Bandits for Dynamic Skill Improvement
We present a ping-pong-playing robot that learns to improve its swings with human advice. Our method learns a reward function over the joint space of task and policy parameters T × P, so the robot can explore policy space more intelligently in a way that trades off exploration vs. exploitation to maximize the total cumulative reward over time. Multimodal stochastic policies can also easily be le...
Optimization as Estimation with Gaussian Processes in Bandit Settings
Recently, there has been rising interest in Bayesian optimization – the optimization of an unknown function with assumptions usually expressed by a Gaussian Process (GP) prior. We study an optimization strategy that directly uses an estimate of the argmax of the function. This strategy offers both practical and theoretical advantages: no tradeoff parameter needs to be selected, and, moreover, w...
Safe Exploration for Optimization with Gaussian Processes (extended version with supplementary material)
We consider sequential decision problems under uncertainty, where we seek to optimize an unknown function from noisy samples. This requires balancing exploration (learning about the objective) and exploitation (localizing the maximum), a problem well-studied in the multiarmed bandit literature. In many applications, however, we require that the sampled function values exceed some prespecified “...
Journal:
Volume / Issue:
Pages: -
Publication date: 2012